Skip to content

Check test cases with measurements #2161

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

Kobzol
Copy link
Contributor

@Kobzol Kobzol commented Jun 17, 2025

With the new design, it will be possible to backfill results into the DB, for example if you ask on a PR that you want to see results for the cranelift backend (which is not benchmarked by default), the collector will go back and actually backfill cranelift backend data for the parent master commit.

To support that, we need to expand the notion of a benchmark being "done". Right now, we record a (artifact, benchmark_name) tuple into the DB (called a step) when a benchmark begins, and then if we ever encounter the same tuple again, we don't benchmark it again. That's not ideal, because if an error happened and no data was generated, you won't be able to retry the collection without removing everything for the given artifact from the DB. And mainly, you cannot backfill more results (e.g. by running only Debug first, and then backfilling Opt, which is useful also for local experiments).

This PR expands the concept of a benchmark being done by actually checking which compile-time test cases are present in the DB. We cheat a bit to have better perf - if there is at least one recorded statistic in the DB for a given test case, we consider it to be done (so we essentially ignore missing iterations, but that should be a niche edge case).

Even though this logic is mostly useful for the new scheme, which is not implemented yet, I decided to also implement it for the current benchmarking logic, because it's useful for local experiments.

Best reviewed commit by commit.

@Kobzol Kobzol requested a review from Mark-Simulacrum June 17, 2025 15:56
@Kobzol Kobzol force-pushed the check-test-cases-with-measurements branch from 4aa85e6 to 6e4c94c Compare June 18, 2025 07:51
@Kobzol Kobzol force-pushed the check-test-cases-with-measurements branch from 6e4c94c to b4ebd59 Compare June 20, 2025 12:05
@Mark-Simulacrum
Copy link
Member

Haven't looked at code yet, but can you clarify how we avoid trying to benchmark e.g. 1000s of artifacts when a new benchmark or new scenario is added? In other words, IIUC the backfill being referenced shouldn't always apply, right? (At least not on the official collector)?

@Kobzol
Copy link
Contributor Author

Kobzol commented Jun 20, 2025

It's a bit complicated 😅 Details are in https://hackmd.io/wq30YNEIQMSFLWWcWDSI9A. The TLDR is:

  1. We have a permanent table (benchmark_request) that stores all requests for benchmarking artifacts. Once a benchmark is completed there, we mark it as completed, and it will never be resubmitted to a collector again (thus we won't auto backfill old PRs when benchmark parameters are added).
  2. We have a transient table (job_queue), which contains specific jobs for a certain collector machine to benchmark. When a try build needs to backfill certain benchmark parameters, it will also create jobs for the master parent commit that will backfill the missing results into the DB.

Note that this PR does not actually allow backfill with the old system, it should work the same as before. It just allows backfill in the future with the new system, while being compatible with both systems.

.into_iter()
.map(|test_case| (*scenario, test_case))
})
.filter(|(_, test_case)| !already_computed.contains(test_case))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this condition be more complex? For example, if we add a new incr-patched scenario, we can't run the build for that without going through incr-clean (at least) - and probably the preceding patches, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have an extra patch scenario, if scenarios.iter().any(|s| s.is_incr()) { will take care of re-running the base IncrFull scenario. We might then have an additional data point for it, but that shouldn't matter much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to handle the preceding patches though, good point.

backend: &CodegenBackend,
target: &Target,
) -> Vec<CompileTestCase> {
self.patches
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we expecting this to get more complicated later? It seems a bit wasteful (I guess fine in practice...) to produce a relatively large vec and then dedup most of it away.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the vec/set dedup and sorting.

@Kobzol Kobzol force-pushed the check-test-cases-with-measurements branch from 685a1ff to d424fa1 Compare June 23, 2025 07:52
@Kobzol
Copy link
Contributor Author

Kobzol commented Jun 23, 2025

Thanks for the comments, I realized that we need to handle incr-patched in a more complicated way. But this also allowed me to simplify the whole check and remove the set/vec sorting, it was not needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants